Adding logic to master_is_stable indicator to check for discovery problems #88020

masseyke · 2022-06-24T14:57:46Z

This PR builds on #86524, #87482, and #87306 by supporting the case where there has been no master node in the last 30 seconds, no node has been elected master, and the current node is master-eligible. This is branch 1.2.2.4 in the diagram at #87482 (comment).
The outline of the logic is that when we see that the master node has gone null, we start polling the other master-eligible nodes for their ClusterFormationState. Once a diagnoseMasterStability() request comes in, we look at the ClusterFormationStates from all of the master-eligible nodes, and the result is one of the following:

1. We have received an exception from one of the other master-eligible nodes (either an error on that node or a timeout), and return RED (1.2.2.4.1)
2. Some nodes report that they have not discovered all of the other master-eligible nodes, and we return RED (1.2.2.4.2)
3. Some nodes report that there is no quorum, and we return RED (1.2.2.4.3.1)
4. Every node thinks there is a quorum, so some other problem is occurring, and we return RED (1.2.2.4.3.2)

Note that in this PR we are not returning all of the details described in the diagram (such as which nodes cannot discover which other nodes). Instead we're only giving the details from the local ClusterFormationState. Once we figure out how the details will be used we will add the other information in a later PR.
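The four outcomes above can be sketched as a single ordered check. This is only a minimal illustration, not the code in this PR: `MasterStabilitySketch`, `NodeReport`, and its fields are hypothetical stand-ins for the polled ClusterFormationState results, and the summary strings are abbreviated from the example responses in this description.

```java
import java.util.List;

// Hypothetical, simplified stand-ins; these are not the real Elasticsearch classes.
final class MasterStabilitySketch {

    /** One polled node's answer: an error message (or null) plus its view of discovery and quorum. */
    record NodeReport(String nodeName, String error, boolean discoveredAllMasterEligibleNodes, boolean hasQuorum) {}

    /** Walks branches 1.2.2.4.1 through 1.2.2.4.3.2 in order; every branch is RED. */
    static String diagnose(List<NodeReport> reports) {
        // 1.2.2.4.1: any exception (remote failure or timeout) wins first
        for (NodeReport report : reports) {
            if (report.error() != null) {
                return "RED: an exception occurred while reaching out to " + report.nodeName() + " for diagnosis";
            }
        }
        // 1.2.2.4.2: at least one node has not discovered all other master-eligible nodes
        if (reports.stream().anyMatch(report -> report.discoveredAllMasterEligibleNodes() == false)) {
            return "RED: some master eligible nodes are unable to discover other master eligible nodes";
        }
        // 1.2.2.4.3.1: at least one node reports that it has no quorum
        if (reports.stream().anyMatch(report -> report.hasQuorum() == false)) {
            return "RED: the master eligible nodes are unable to form a quorum";
        }
        // 1.2.2.4.3.2: everyone sees a quorum, so something else is wrong
        return "RED: the cause has not been determined";
    }
}
```

The ordering matters in this sketch: an exception from any node short-circuits the discovery and quorum checks, because the remaining data may be incomplete.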

Here is an example response for case 1 above:

{
    "status": "red",
    "cluster_name": "TEST-TEST_WORKER_VM=[--not-gradle--]-CLUSTER_SEED=[-8835703766361244273]-HASH=[11CD8E4183A082]-cluster",
    "components": {
        "cluster_coordination": {
            "status": "red",
            "indicators": {
                "master_is_stable": {
                    "status": "red",
                    "summary": "No master node observed in the last 1s, and an exception occurred while reaching out to node_t1 for diagnosis",
                    "help_url": "https://ela.st/fix-master",
                    "details": {
                        "current_master": {
                            "node_id": null,
                            "name": null
                        },
                        "recent_masters": [
                            {
                                "node_id": "EuJQ4HDWQRSSTPFRd9MkGw",
                                "name": "node_t0"
                            }
                        ],
                        "exception_fetching_history": {
                            "message": "Artificial failure",
                            "stack_trace": "java.lang.RuntimeException: Artificial failure\n\tat org.elasticsearch.cluster.coordination.CoordinationDiagnosticsService$ClusterFormationStateOrException.<init>(CoordinationDiagnosticsService.java:640)\n\tat org.elasticsearch.cluster.coordination.CoordinationDiagnosticsService$1$1.onResponse(CoordinationDiagnosticsService.java:604)\n\tat org.elasticsearch.cluster.coordination.CoordinationDiagnosticsService$1$1.onResponse(CoordinationDiagnosticsService.java:592)\n\tat org.elasticsearch.action.ActionListener$RunBeforeActionListener.onResponse(ActionListener.java:415)\n\tat org.elasticsearch.action.ActionListenerResponseHandler.handleResponse(ActionListenerResponseHandler.java:43)\n\tat org.elasticsearch.transport.TransportService$ContextRestoreResponseHandler.handleResponse(TransportService.java:1337)\n\tat org.elasticsearch.transport.TransportService$DirectResponseChannel.processResponse(TransportService.java:1422)\n\tat org.elasticsearch.transport.TransportService$DirectResponseChannel.sendResponse(TransportService.java:1402)\n\tat org.elasticsearch.transport.TaskTransportChannel.sendResponse(TaskTransportChannel.java:41)\n\tat org.elasticsearch.action.support.ChannelActionListener.onResponse(ChannelActionListener.java:39)\n\tat org.elasticsearch.action.support.ChannelActionListener.onResponse(ChannelActionListener.java:20)\n\tat org.elasticsearch.action.admin.cluster.coordination.ClusterFormationInfoAction$TransportAction.doExecute(ClusterFormationInfoAction.java:137)\n\tat org.elasticsearch.action.admin.cluster.coordination.ClusterFormationInfoAction$TransportAction.doExecute(ClusterFormationInfoAction.java:120)\n\tat org.elasticsearch.action.support.TransportAction$RequestFilterChain.proceed(TransportAction.java:79)\n\tat org.elasticsearch.action.support.TransportAction.execute(TransportAction.java:54)\n\tat 
org.elasticsearch.action.support.HandledTransportAction$TransportHandler.messageReceived(HandledTransportAction.java:71)\n\tat org.elasticsearch.action.support.HandledTransportAction$TransportHandler.messageReceived(HandledTransportAction.java:67)\n\tat org.elasticsearch.transport.RequestHandlerRegistry.processMessageReceived(RequestHandlerRegistry.java:67)\n\tat org.elasticsearch.transport.TransportService.sendLocalRequest(TransportService.java:908)\n\tat org.elasticsearch.transport.TransportService$3.sendRequest(TransportService.java:123)\n\tat org.elasticsearch.transport.TransportService.sendRequestInternal(TransportService.java:848)\n\tat org.elasticsearch.transport.TransportService.sendRequest(TransportService.java:737)\n\tat org.elasticsearch.transport.TransportService.sendRequest(TransportService.java:683)\n\tat org.elasticsearch.cluster.coordination.CoordinationDiagnosticsService$1.onResponse(CoordinationDiagnosticsService.java:587)\n\tat org.elasticsearch.cluster.coordination.CoordinationDiagnosticsService$1.onResponse(CoordinationDiagnosticsService.java:581)\n\tat org.elasticsearch.transport.TransportService.connectToNode(TransportService.java:411)\n\tat org.elasticsearch.cluster.coordination.CoordinationDiagnosticsService.lambda$beginPollingClusterFormationInfo$3(CoordinationDiagnosticsService.java:577)\n\tat java.base/java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:539)\n\tat java.base/java.util.concurrent.FutureTask.runAndReset$$$capture(FutureTask.java:305)\n\tat java.base/java.util.concurrent.FutureTask.runAndReset(FutureTask.java)\n\tat java.base/java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:305)\n\tat java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1136)\n\tat java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:635)\n\tat java.base/java.lang.Thread.run(Thread.java:833)\n"
                        },
                        "cluster_formation": {
                            "node1": "master not discovered or elected yet, an election requires at least 2 nodes with ids from [EuJQ4HDWQRSSTPFRd9MkGw, b-6204euRty75T3FWMuykA, 6EBYghrnQrWE8yiwTdlGTg], have discovered possible quorum [{node_t1}{6EBYghrnQrWE8yiwTdlGTg}{euqgRAHyRxG0IJWfARa9Hw}{node_t1}{127.0.0.1}{127.0.0.1:13301}{cdfhilmrstw}, {node_t2}{b-6204euRty75T3FWMuykA}{u7NSjyF3RfmT1rCRD-F7Wg}{node_t2}{127.0.0.1}{127.0.0.1:13303}{cdfhilmrstw}, {node_t0}{EuJQ4HDWQRSSTPFRd9MkGw}{nvCD3YwyRDu3tFN1_7JLlA}{node_t0}{127.0.0.1}{127.0.0.1:13302}{cdfhilmrstw}]; discovery will continue using [127.0.0.1:13302, 127.0.0.1:13303] from hosts providers and [{node_t2}{b-6204euRty75T3FWMuykA}{u7NSjyF3RfmT1rCRD-F7Wg}{node_t2}{127.0.0.1}{127.0.0.1:13303}{cdfhilmrstw}, {node_t0}{EuJQ4HDWQRSSTPFRd9MkGw}{nvCD3YwyRDu3tFN1_7JLlA}{node_t0}{127.0.0.1}{127.0.0.1:13302}{cdfhilmrstw}, {node_t1}{6EBYghrnQrWE8yiwTdlGTg}{euqgRAHyRxG0IJWfARa9Hw}{node_t1}{127.0.0.1}{127.0.0.1:13301}{cdfhilmrstw}] from last-known cluster state; node term 1, last-accepted version 3 in term 1",
                            "node2": "master not discovered or elected yet...",
                            "node3": "master not discovered or elected yet..."
                        }
                    },
                    "impacts": [...],
                    "user_actions": [...]
                }
            }
        },
        "data": {...},
        "snapshot": {...}
    }
}

And an example from case 2:

{
    "status": "red",
    "cluster_name": "TEST-TEST_WORKER_VM=[--not-gradle--]-CLUSTER_SEED=[-6341988252367058471]-HASH=[11CC4EF6482AF0]-cluster",
    "components": {
        "cluster_coordination": {
            "status": "red",
            "indicators": {
                "master_is_stable": {
                    "status": "red",
                    "summary": "No master node observed in the last 30s, and some master eligible nodes are unable to discover other master eligible nodes",
                    "help_url": "https://ela.st/fix-master",
                    "details": {
                        "current_master": {
                            "node_id": null,
                            "name": null
                        },
                        "recent_masters": [
                            {
                                "node_id": "LBrijWVtTiWZm03kHDE1iw",
                                "name": "node_t2"
                            }
                        ],
                        "cluster_formation": {
                            "node1": "master not discovered or elected yet, an election requires at least 2 nodes with ids from [R25HgyJkTVmfr2j7Lv3F_Q, LBrijWVtTiWZm03kHDE1iw, okA4QNE9R0aTvTCEKPinuA], have discovered possible quorum [{node_t1}{R25HgyJkTVmfr2j7Lv3F_Q}{wm2Tz70qS4yDpR1N8otm_A}{node_t1}{127.0.0.1}{127.0.0.1:13302}{cdfhilmrstw}, {node_t0}{okA4QNE9R0aTvTCEKPinuA}{3nNebXXZQ5CCWd61EM4iww}{node_t0}{127.0.0.1}{127.0.0.1:13301}{cdfhilmrstw}, {node_t2}{LBrijWVtTiWZm03kHDE1iw}{Y-h0Er4NSVuS0XPIskjsHA}{node_t2}{127.0.0.1}{127.0.0.1:13303}{cdfhilmrstw}]; discovery will continue using [127.0.0.1:13301, 127.0.0.1:13303] from hosts providers and [{node_t0}{okA4QNE9R0aTvTCEKPinuA}{3nNebXXZQ5CCWd61EM4iww}{node_t0}{127.0.0.1}{127.0.0.1:13301}{cdfhilmrstw}, {node_t2}{LBrijWVtTiWZm03kHDE1iw}{Y-h0Er4NSVuS0XPIskjsHA}{node_t2}{127.0.0.1}{127.0.0.1:13303}{cdfhilmrstw}, {node_t1}{R25HgyJkTVmfr2j7Lv3F_Q}{wm2Tz70qS4yDpR1N8otm_A}{node_t1}{127.0.0.1}{127.0.0.1:13302}{cdfhilmrstw}] from last-known cluster state; node term 1, last-accepted version 4 in term 1; joining [{node_t2}{LBrijWVtTiWZm03kHDE1iw}{Y-h0Er4NSVuS0XPIskjsHA}{node_t2}{127.0.0.1}{127.0.0.1:13303}{cdfhilmrstw}] in term [1] has status [waiting for response] after [1ms]",
                            "node3": "master not discovered or elected yet...",
                            "node2": "master not discovered or elected yet..."
                        }
                    },
                    "impacts": [...],
                    "user_actions": [...]
                }
            }
        },
        "data": {...},
        "snapshot": {...}
    }
}

And case 3:

{
    "status": "red",
    "cluster_name": "TEST-TEST_WORKER_VM=[--not-gradle--]-CLUSTER_SEED=[1888251737628875036]-HASH=[117E5600A0F2DB]-cluster",
    "components": {
        "cluster_coordination": {
            "status": "red",
            "indicators": {
                "master_is_stable": {
                    "status": "red",
                    "summary": "No master node observed in the last 30s, and the master eligible nodes are unable to form a quorum",
                    "help_url": "https://ela.st/fix-master",
                    "details": {
                        "current_master": {
                            "node_id": null,
                            "name": null
                        },
                        "recent_masters": [
                            {
                                "node_id": "9hPCQwPsRrCX9g4DjYywzw",
                                "name": "node_t2"
                            }
                        ],
                        "cluster_formation": {
                            "node2": "master not discovered or elected yet, an election requires a node with id [dcA01mVWTiebGysnbNhwiA], have only discovered non-quorum [{node_t0}{ryn0dLlATV2b8WCyIlIN6A}{pVJwZeLPRqKvqSCO53KQ2Q}{node_t0}{127.0.0.1}{127.0.0.1:13303}{m}]; discovery will continue using [127.0.0.1:13301, 127.0.0.1:13302] from hosts providers and [{node_t0}{ryn0dLlATV2b8WCyIlIN6A}{pVJwZeLPRqKvqSCO53KQ2Q}{node_t0}{127.0.0.1}{127.0.0.1:13303}{m}, {node_t1}{dcA01mVWTiebGysnbNhwiA}{OeAmKbeZQD-kOmKimCDY8Q}{node_t1}{127.0.0.1}{127.0.0.1:13301}{m}, {node_t2}{9hPCQwPsRrCX9g4DjYywzw}{kZWBj1g7QZCNWRSGu9EiVg}{node_t2}{127.0.0.1}{127.0.0.1:13302}{m}] from last-known cluster state; node term 2, last-accepted version 7 in term 1; joining [{node_t1}{dcA01mVWTiebGysnbNhwiA}{OeAmKbeZQD-kOmKimCDY8Q}{node_t1}{127.0.0.1}{127.0.0.1:13301}{m}] in term [2] has status [waiting for response] after [30s/30039ms]",
                            "node1": "master not discovered or elected yet...",
                            "node3": "master not discovered or elected yet..."
                        }
                    },
                    "impacts": [...],
                    "user_actions": [...]
                }
            }
        },
        "data": {...},
        "snapshot": {...}
    }
}

And case 4:

{
    "status": "red",
    "cluster_name": "TEST-TEST_WORKER_VM=[--not-gradle--]-CLUSTER_SEED=[-6341988252367058471]-HASH=[11CC4EF6482AF0]-cluster",
    "components": {
        "cluster_coordination": {
            "status": "red",
            "indicators": {
                "master_is_stable": {
                    "status": "red",
                    "summary": "No master node observed in the last 30s, and the cause has not been determined.",
                    "help_url": "https://ela.st/fix-master",
                    "details": {
                        "current_master": {
                            "node_id": null,
                            "name": null
                        },
                        "recent_masters": [
                            {
                                "node_id": "LBrijWVtTiWZm03kHDE1iw",
                                "name": "node_t2"
                            }
                        ],
                        "cluster_formation": {
                            "node1": "master not discovered or elected yet, an election requires at least 2 nodes with ids from [R25HgyJkTVmfr2j7Lv3F_Q, LBrijWVtTiWZm03kHDE1iw, okA4QNE9R0aTvTCEKPinuA], have discovered possible quorum [{node_t1}{R25HgyJkTVmfr2j7Lv3F_Q}{wm2Tz70qS4yDpR1N8otm_A}{node_t1}{127.0.0.1}{127.0.0.1:13302}{cdfhilmrstw}, {node_t0}{okA4QNE9R0aTvTCEKPinuA}{3nNebXXZQ5CCWd61EM4iww}{node_t0}{127.0.0.1}{127.0.0.1:13301}{cdfhilmrstw}, {node_t2}{LBrijWVtTiWZm03kHDE1iw}{Y-h0Er4NSVuS0XPIskjsHA}{node_t2}{127.0.0.1}{127.0.0.1:13303}{cdfhilmrstw}]; discovery will continue using [127.0.0.1:13301, 127.0.0.1:13303] from hosts providers and [{node_t0}{okA4QNE9R0aTvTCEKPinuA}{3nNebXXZQ5CCWd61EM4iww}{node_t0}{127.0.0.1}{127.0.0.1:13301}{cdfhilmrstw}, {node_t2}{LBrijWVtTiWZm03kHDE1iw}{Y-h0Er4NSVuS0XPIskjsHA}{node_t2}{127.0.0.1}{127.0.0.1:13303}{cdfhilmrstw}, {node_t1}{R25HgyJkTVmfr2j7Lv3F_Q}{wm2Tz70qS4yDpR1N8otm_A}{node_t1}{127.0.0.1}{127.0.0.1:13302}{cdfhilmrstw}] from last-known cluster state; node term 1, last-accepted version 4 in term 1",
                            "node2": "master not discovered or elected yet...",
                            "node3": "master not discovered or elected yet..."
                        }
                    },
                    "impacts": [...],
                    "user_actions": [...]
                }
            }
        },
        "data": {...},
        "snapshot": {...}
    }
}

elasticmachine · 2022-06-30T21:19:51Z

Pinging @elastic/es-data-management (Team:Data Management)

elasticsearchmachine · 2022-06-30T21:20:12Z

Hi @masseyke, I've created a changelog YAML for you.

andreidan

Thanks for working on this Keith.

I have some questions about the approach we took here.

Also, would you mind trimming the description to contain only the relevant parts? (i.e. the master_is_stable indicator, without impacts and the like; it seems that only summary and details are affected)

andreidan · 2022-07-04T14:06:27Z

server/src/main/java/org/elasticsearch/cluster/coordination/CoordinationDiagnosticsService.java

+        } else if (clusterService.localNode().isMasterNode() == false) { // none is elected master and we aren't master eligible
            // NOTE: The logic in this block will be implemented in a future PR
            result = new CoordinationDiagnosticsResult(
                CoordinationDiagnosticsStatus.RED,
                "No master has been observed recently",
                CoordinationDiagnosticsDetails.EMPTY
            );
+        } else { // none is elected master and we are master eligible
+            result = diagnoseOnHaveNotSeenMasterRecentlyAndWeAreMasterEligible(localMasterHistory, masterEligibleNodes, explain);
+        }


I think this if/else block is becoming hard to follow and reason about,
i.e. where do we check that we aren't a master-eligible node? The last else statement has a bunch of implicit decisions that are hard to verify (i.e. why are we sure we're master-eligible in this case?)

The else if right above the else checks if we're master eligible. I'll try to make it more explicit.
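One way the eligibility check could be made explicit is to name each condition before branching on it. The sketch below is an assumption about what that might look like, not the change that landed: `BranchSketch`, `Branch`, and `choose` are hypothetical names, and all of the earlier branches of the real if/else chain are collapsed into a single `masterSeenRecently` flag.

```java
// Hypothetical illustration of naming the conditions; not the actual CoordinationDiagnosticsService code.
final class BranchSketch {

    enum Branch {
        MASTER_SEEN_RECENTLY,
        NO_MASTER_AND_NOT_MASTER_ELIGIBLE,
        NO_MASTER_AND_MASTER_ELIGIBLE
    }

    static Branch choose(boolean masterSeenRecently, boolean localNodeIsMasterEligible) {
        if (masterSeenRecently) {
            // Stands in for every earlier branch of the real chain (an assumption of this sketch)
            return Branch.MASTER_SEEN_RECENTLY;
        }
        if (localNodeIsMasterEligible == false) {
            // No master, and this node cannot become one
            return Branch.NO_MASTER_AND_NOT_MASTER_ELIGIBLE;
        }
        // Reaching this point proves both facts: no master and the local node is master-eligible
        assert localNodeIsMasterEligible;
        return Branch.NO_MASTER_AND_MASTER_ELIGIBLE;
    }
}
```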

andreidan · 2022-07-04T14:26:55Z

server/src/main/java/org/elasticsearch/cluster/coordination/CoordinationDiagnosticsService.java

+     * @param nodeToClusterFormationStateMap A map of each master node to its ClusterFormationState
+     * @return true if there are discovery problems, false otherwise
+     */
+    private boolean hasDiscoveryProblems(


This is a bit ambiguous w.r.t. what it is diagnosing - who has discovery problems?

Maybe we can be more intentional in the method name and also return (or log?) the problems we discover?
ie. who cannot discover which node?

I was originally calculating and returning this, but removed that since we're not going to be putting it in the details in the response. I can change it to log that information for now.
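For reference, computing "who cannot discover which node" could look roughly like the following. This is a sketch under stated assumptions, not the removed code: `DiscoveryProblemsSketch` and `undiscoveredByNode` are hypothetical, and each node's view is reduced to a plain set of node names rather than a ClusterFormationState.

```java
import java.util.Map;
import java.util.Set;
import java.util.TreeMap;
import java.util.TreeSet;

// Hypothetical helper; the inputs are simplified to plain node names.
final class DiscoveryProblemsSketch {

    /**
     * For each reporting node, returns the master-eligible nodes it has not discovered.
     * Nodes that have discovered everyone are left out of the result.
     */
    static Map<String, Set<String>> undiscoveredByNode(Set<String> masterEligibleNodes, Map<String, Set<String>> discoveredByNode) {
        Map<String, Set<String>> problems = new TreeMap<>();
        for (Map.Entry<String, Set<String>> entry : discoveredByNode.entrySet()) {
            Set<String> missing = new TreeSet<>(masterEligibleNodes);
            missing.remove(entry.getKey());      // a node does not need to discover itself
            missing.removeAll(entry.getValue()); // drop everything it has found
            if (missing.isEmpty() == false) {
                problems.put(entry.getKey(), missing);
            }
        }
        return problems;
    }
}
```

An empty result corresponds to "no discovery problems", and a non-empty one gives the per-node detail that could be logged for now.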

andreidan · 2022-07-04T14:28:18Z

server/src/main/java/org/elasticsearch/cluster/coordination/CoordinationDiagnosticsService.java

+     * @param nodeToClusterFormationStateMap A map of each master node to its ClusterFormationState
+     * @return True if any nodes in nodeToClusterFormationStateMap report a problem forming a quorum, false otherwise.
+     */
+    private boolean hasQuorumProblems(


Same as above - would it be useful to log the problems we discover? Or return them?

Yeah at some point we'll be putting them into the details section of the response. For now I'll log them.

Just to confirm - this means they don't currently add any new information compared to what we provide in the generic cluster_formation.description field. Is that correct?

Right. That was the conclusion to the discussion on the document I shared about this:

W.r.t. structure I don’t think we have a clear indication as to how the details section of the `master_is_stable` indicator is going to be used for now, so I’d suggest we keep the `ClusterFormationState#description` as the only field in the `master_is_stable` details field for now and add structure at a later phase. Once the health API is used in the diagnostics bundle (very soon) we’ll be able to get some more engineers exposed to the `details` field and get some feedback about the needs and shortcomings.

++ I was just curious if we've gained extra information in these diagnostic steps

Thanks for the confirmation
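To illustrate what "a problem forming a quorum" means in the sample responses (for example "an election requires at least 2 nodes with ids from [...]" for three voting nodes), here is a simplified majority check. It is an assumption-laden sketch: `QuorumSketch` and `hasQuorum` are hypothetical, and only a single voting configuration is considered, which may be simpler than what the real cluster coordination code evaluates.

```java
import java.util.Set;

// Hypothetical, simplified majority check over a single voting configuration.
final class QuorumSketch {

    /** True if strictly more than half of the voting node ids are among the discovered ids. */
    static boolean hasQuorum(Set<String> votingNodeIds, Set<String> discoveredNodeIds) {
        long found = votingNodeIds.stream().filter(discoveredNodeIds::contains).count();
        // With an empty voting configuration this is 0 > 0, so there is never a quorum
        return found * 2 > votingNodeIds.size();
    }
}
```

With three voting nodes, two discovered ids are enough and one is not, which matches the "at least 2 nodes" wording in the descriptions above.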

andreidan · 2022-07-04T14:32:36Z

Would we benefit from attaching #87482 (comment) to the meta issue ?

masseyke · 2022-07-05T19:31:58Z

Would we benefit from attaching #87482 (comment) to the meta issue ?

Added at #85624 (comment)

masseyke · 2022-07-12T23:04:15Z

@elasticmachine update branch

andreidan

Thanks for iterating on this Keith

Left one suggestion.

andreidan · 2022-07-25T16:29:43Z

server/src/main/java/org/elasticsearch/cluster/coordination/CoordinationDiagnosticsService.java

+                        + "eligible nodes",
+                    nodeHasMasterLookupTimeframe
+                ),
+                getDetails(explain, localMasterHistory, null, coordinator.getClusterFormationState().getDescription())


I've seen @DaveCTurner use the cluster formation details from all involved nodes.

I think, since we now have every master node's view of the cluster formation, we should report each node's view under the details section.

What do you think?

Yeah we definitely need to do that. It sounds like we had a miscommunication earlier -- I thought you wanted that information removed so I removed it. I'll find a way to put it back in.

OK if there is a discovery or quorum problem, I am now putting all of the cluster formation descriptions for all master nodes into the details for debugging purposes.

andreidan · 2022-07-25T16:35:53Z

...rc/test/java/org/elasticsearch/cluster/coordination/CoordinationDiagnosticsServiceTests.java

+        assertThat(result.summary(), containsString(" some master eligible nodes are unable to discover other master eligible nodes"));
+    }
+
+    public void testAnyNodeInClusterReportsDiscoveryProblems() {


masseyke · 2022-07-25T19:40:32Z

@elasticmachine run elasticsearch-ci/packaging-tests-windows-sample

masseyke · 2022-07-25T19:54:21Z

@elasticmachine run elasticsearch-ci/bwc

andreidan

Thanks for iterating on this Keith

Just a couple more questions left

andreidan · 2022-07-26T14:31:52Z

server/src/main/java/org/elasticsearch/cluster/coordination/CoordinationDiagnosticsService.java

@@ -361,7 +511,8 @@ private CoordinationDiagnosticsResult getResultOnNoMasterEligibleNodes(MasterHis
        CoordinationDiagnosticsDetails details = getDetails(
            explain,
            localMasterHistory,
-            coordinator.getClusterFormationState().getDescription()
+            null,
+            Map.of(coordinator.getLocalNode().getId(), coordinator.getClusterFormationState().getDescription())


Would it be beneficial if the local node's view were expressed separately? Or, to put it the other way, should the remote ones be expressed under a new object in the representation? remote, or something better named?

Also, would we still want to have the cluster formation expressed as objects for future extension? (in case we'll want to add some structure)

ie:

"localNode": { "description" : " issues" }, "remote" : { "node1" : { "description" : "issues" }, "node2": { "description" : "other things"} }

I couldn't really think of any reason, so I put them together to simplify the response. Can you think of a reason it would be useful to have them separate?

I was wondering if in large clusters it'll become a bit difficult to determine the local view amongst a list of remote views? Maybe I'm over-optimising.
Could you please update the PR description with the latest responses and let's move ahead with the simple solution for now (we could iterate afterwards to improve it - once we see it in live cases)
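The two shapes discussed in this thread can be compared side by side. The sketch below is illustrative only: `ClusterFormationDetailsSketch`, `flat`, and `nested` are hypothetical helpers, the `localNode`, `remote`, and `description` keys come from the suggestion above, and plain maps stand in for whatever response objects the real code uses.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical helpers contrasting the flat and nested representations.
final class ClusterFormationDetailsSketch {

    /** Flat shape: node id -> description, with the local node listed alongside the remote ones. */
    static Map<String, String> flat(String localNodeId, String localDescription, Map<String, String> remoteDescriptions) {
        Map<String, String> result = new LinkedHashMap<>();
        result.put(localNodeId, localDescription);
        result.putAll(remoteDescriptions);
        return result;
    }

    /** Nested shape: the local view on its own, remote views as objects that leave room for more fields. */
    static Map<String, Object> nested(String localDescription, Map<String, String> remoteDescriptions) {
        Map<String, Object> remote = new LinkedHashMap<>();
        remoteDescriptions.forEach((nodeId, description) -> remote.put(nodeId, Map.of("description", description)));
        Map<String, Object> result = new LinkedHashMap<>();
        result.put("localNode", Map.of("description", localDescription));
        result.put("remote", remote);
        return result;
    }
}
```

The flat shape keeps the response small, while the nested one makes the local view easy to find in a large cluster at the cost of a deeper structure.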

andreidan

LGTM, thanks for implementing this Keith

masseyke · 2022-07-27T14:32:06Z

@elasticmachine run elasticsearch-ci/part-1

Adding logic to master_is_stable indicator to check for discovery problems

422f723

elasticsearchmachine added the v8.4.0 label Jun 24, 2022

masseyke added 9 commits June 28, 2022 12:01

Merge branch 'master' into feature/health-api-master-stability-discovery

6e21479

improved indicator summaries and better unit test

01f44a1

cleaning up and unit testing

22c649b

getting rid of unneeded getDiscoveryProblems and getQuorumProblems co…

89a44c9

…mplexit

commenting about nodeToClusterFormationStateOrExceptionMap

5060780

merging master

6ddbf08

spotlessApply

942385b

Adding an integration test for no quorum

342ce84

Merge branch 'master' into feature/health-api-master-stability-discovery

1b633e6

masseyke added :Data Management/Health >enhancement labels Jun 30, 2022

masseyke marked this pull request as ready for review June 30, 2022 21:19

elasticmachine added the Team:Data Management label Jun 30, 2022

Update docs/changelog/88020.yaml

f2c4d3f

masseyke requested a review from andreidan June 30, 2022 21:20

andreidan reviewed Jul 4, 2022

code review feedback

30d68a8

masseyke added 2 commits July 5, 2022 14:50

Merge branch 'feature/health-api-master-stability-discovery' of github.com:masseyke/elasticsearch into feature/health-api-master-stability-discovery

b9d4852

avoiding forbidden API

940d104

masseyke requested a review from andreidan July 5, 2022 21:34

improving beginPollingClusterFormationInfo()

870f79d

masseyke mentioned this pull request Jul 11, 2022

Polling cluster formation state for master-is-stable health indicator #88397

Merged

more unit testing

fc3dfd3

elasticmachine and others added 7 commits July 13, 2022 08:34

Merge branch 'master' into feature/health-api-master-stability-discovery

462c892

adding unit tests

7455e3d

Merge branch 'feature/health-api-master-stability-discovery' of github.com:masseyke/elasticsearch into feature/health-api-master-stability-discovery

d5a7423

merging master

05e83fb

documenting and improving unit tests

f1b17b0

minor cleanup

e078f64

minor doc cleanup

e214e4a

masseyke requested a review from andreidan July 14, 2022 19:15

fixing a unit test

aeef0ed

elasticsearchmachine changed the base branch from master to main July 22, 2022 23:06

andreidan reviewed Jul 25, 2022

masseyke added 2 commits July 25, 2022 14:09

Putting cluster formation info into the details

93c6f5f

merging main

8e2f48f

masseyke requested a review from andreidan July 26, 2022 13:29

andreidan reviewed Jul 26, 2022

code review feedback

07939f0

andreidan approved these changes Jul 26, 2022

masseyke added 2 commits July 27, 2022 08:54

Using node name instead of id

3a0b817

reverting previous commit

2e6e53d

mark-vieira added v8.5.0 and removed v8.4.0 labels Jul 27, 2022

masseyke merged commit 41d7280 into elastic:main Jul 27, 2022

masseyke deleted the feature/health-api-master-stability-discovery branch July 27, 2022 15:26
